MNIST Dataset

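If you're familiar with `sklearn`, you may know its `digits` dataset. That is a smaller, related dataset of 8x8 digit images, not MNIST itself, and it is loaded with:

from sklearn.datasets import load_digits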

Problem Definition

Recognize handwritten digits

Data

The MNIST database is a collection of images of handwritten digits.

The training set has $60,000$ samples. The test set has $10,000$ samples.

The digits are size-normalized and centered in a fixed-size image.

The data page describes how the data was collected. It also reports benchmark results for various algorithms on the test set.

Load the data

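The data is available in the repo's data folder. Let's load it using the keras library. (When called without arguments, `mnist.load_data()` downloads the dataset on first use and caches it under `~/.keras/datasets`.)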

For now, let's load the data and see how it looks.



In [ ]:

    
import numpy as np
import keras
from keras.datasets import mnist



In [ ]:

    
# Load the datasets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Basic data analysis on the dataset



In [ ]:

    
# What is the type of X_train?



In [ ]:

    
# What is the type of y_train?
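
One possible way to answer the two type questions is sketched below. It uses small stand-in arrays (`X_demo`, `y_demo`, both hypothetical) so that it runs without downloading anything; in the notebook you would pass `X_train` and `y_train` instead. The helper name `describe_type` is also just an illustration.

```python
import numpy as np

def describe_type(name, arr):
    """Report the container type and the element dtype of an array."""
    return f"{name}: type={type(arr).__name__}, dtype={arr.dtype}"

# Stand-in arrays (assumption): same layout as the keras MNIST arrays,
# but only 3 samples and made-up contents.
X_demo = np.zeros((3, 28, 28), dtype=np.uint8)
y_demo = np.array([7, 2, 1], dtype=np.uint8)

print(describe_type("X_demo", X_demo))
print(describe_type("y_demo", y_demo))
```

`type(...)` tells you the container (a NumPy `ndarray`), while `.dtype` tells you how each pixel or label is stored.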



In [ ]:

    
# Find number of observations in training data



In [ ]:

    
# Find number of observations in test data
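
A minimal sketch for counting observations, assuming the images are stacked along the first axis (as they are in the arrays returned by `mnist.load_data()`). The arrays `X_train_demo` and `X_test_demo` are hypothetical stand-ins, and `n_observations` is an illustrative helper.

```python
import numpy as np

def n_observations(arr):
    """Return the number of samples, i.e. the length of the first axis."""
    return arr.shape[0]

# Stand-in arrays (assumption): 6 "training" and 2 "test" images.
X_train_demo = np.zeros((6, 28, 28), dtype=np.uint8)
X_test_demo = np.zeros((2, 28, 28), dtype=np.uint8)

print(n_observations(X_train_demo), n_observations(X_test_demo))

# len() on an array also returns the length of the first axis.
print(len(X_train_demo) == n_observations(X_train_demo))
```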



In [ ]:

    
# Display first 2 records of X_train



In [ ]:

    
# Display the first 10 records of y_train



In [ ]:

    
# Find the number of observations for each digit in the y_train dataset



In [ ]:

    
# Find the number of observations for each digit in the y_test dataset
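
The per-digit counts for either label array can be sketched with `np.bincount`, which counts how often each non-negative integer occurs. The labels in `sample_labels` are made up for illustration; in the notebook you would pass `y_train` or `y_test`. The helper name `digit_counts` is hypothetical.

```python
import numpy as np

def digit_counts(labels):
    """Return an array whose i-th entry is the number of labels equal to i."""
    # minlength=10 guarantees one slot per digit, even for digits that never occur.
    return np.bincount(labels, minlength=10)

# Made-up labels (assumption), not real MNIST data.
sample_labels = np.array([3, 1, 1, 8, 0, 3, 1], dtype=np.uint8)

for digit, count in enumerate(digit_counts(sample_labels)):
    print(digit, count)
```

`np.unique(labels, return_counts=True)` is an alternative, but it omits digits that do not appear at all.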



In [ ]:

    
# What are the dimensions of X_train? What do they mean?
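
One way to read the dimensions is sketched below, assuming the array has the layout (number of images, image height, image width). `X_demo` is a hypothetical stand-in with 4 images; the real `X_train` has the same three axes but 60,000 entries along the first one.

```python
import numpy as np

# Stand-in array (assumption): 4 blank images of 28x28 pixels.
X_demo = np.zeros((4, 28, 28), dtype=np.uint8)

# Three axes: which image, which row of pixels, which column of pixels.
print(X_demo.ndim)
n_images, height, width = X_demo.shape
print(n_images, height, width)

# Flattening each image yields one row of height * width pixel values per sample.
X_flat = X_demo.reshape(n_images, -1)
print(X_flat.shape)
```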

Display Images

Let's now display some of the images and see how they look.

We will use the matplotlib library to display the images.



In [ ]:

    
from matplotlib import pyplot
import matplotlib as mpl
%matplotlib inline



In [ ]:

    
# Display the first training image



In [ ]:

    
fig = pyplot.figure()
ax = fig.add_subplot(1,1,1)
imgplot = ax.imshow(X_train[0], cmap=mpl.cm.Greys)
imgplot.set_interpolation('nearest')
ax.xaxis.set_ticks_position('top')
ax.yaxis.set_ticks_position('left')
pyplot.show()



In [ ]:

    
# Let's now display the 11th record
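
A possible sketch for this step wraps the plotting calls used for the first image in a small helper. The name `show_digit` and the random stand-in images are assumptions for illustration; in the notebook you would call `show_digit(X_train, 10)`. Note that the 11th record sits at index 10, because indexing starts at 0.

```python
import numpy as np
from matplotlib import pyplot

def show_digit(images, index):
    """Draw images[index] in greyscale and return the Axes it was drawn on."""
    fig, ax = pyplot.subplots()
    ax.imshow(images[index], cmap="Greys", interpolation="nearest")
    ax.xaxis.set_ticks_position("top")
    ax.yaxis.set_ticks_position("left")
    return ax

# Stand-in images (assumption): 11 random 28x28 arrays, not real digits.
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(11, 28, 28), dtype=np.uint8)

# Index 10 selects the 11th record.
ax = show_digit(demo_images, 10)
pyplot.show()
```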



In [ ]: